@spock spock commented Nov 13, 2023

Adds support for non-Latin-1 characters (e.g. Cyrillic) in file names. Untested; let's see if CI/CD passes :)

related to #13

codecov bot commented Nov 13, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

see 5 files with indirect coverage changes

Owner

lrq3000 commented Nov 13, 2023

Thank you very much for your PR! Unfortunately, it fails to build. There is hope, though: more than half of the tests pass, which suggests that minor rewrites to the tests may be enough to get it working again. I'll have a look.

Owner

lrq3000 commented Nov 20, 2024

I did not reply earlier, but I did look into this, and getting the remaining tests to pass requires more than minor rewrites.

I will come back to it eventually when I have time, and maybe AI is now advanced enough to actually be useful for this kind of bug fixing.

@lrq3000 lrq3000 requested a review from Copilot November 13, 2025 11:21
Copilot finished reviewing on behalf of lrq3000 November 13, 2025 11:23
Copilot AI left a comment

Pull Request Overview

This PR changes the character encoding from latin-1 to utf-8 in the b() compatibility function to enable support for Cyrillic and other non-ASCII characters in file names.

Key Changes:

  • Updated b() function to use utf_8_encode instead of latin_1_encode for Python 3 string-to-bytes conversion

 def b(x):
     if isinstance(x, _str):
-        return codecs.latin_1_encode(x)[0]
+        return codecs.utf_8_encode(x)[0]
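
To illustrate why the encoding matters, here is a minimal sketch comparing the two codecs on a Cyrillic file name. The names `b_latin1` and `b_utf8` are hypothetical stand-ins, not the project's code. The sketch assumes Python 3, where `_str` is taken to be `str`, and assumes that non-string input is returned unchanged, since the diff does not show that branch.

```python
import codecs


def b_latin1(x):
    """Hypothetical stand-in for b() before the change (latin-1)."""
    if isinstance(x, str):  # assumption: _str is str on Python 3
        return codecs.latin_1_encode(x)[0]
    return x  # assumption: non-str input passes through unchanged


def b_utf8(x):
    """Hypothetical stand-in for b() after the change (utf-8)."""
    if isinstance(x, str):
        return codecs.utf_8_encode(x)[0]
    return x


name = "файл.txt"  # a file name containing Cyrillic letters

# latin-1 only covers code points up to U+00FF, so this raises.
try:
    b_latin1(name)
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# utf-8 can represent the same name as bytes.
encoded = b_utf8(name)
```

Each Cyrillic letter here takes two bytes in UTF-8, so the encoded name is longer than the string. Any code that assumes one byte per character, which holds for latin-1, would need to be checked.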
Copilot AI Nov 13, 2025

The encoding has been changed from latin-1 to utf-8 in the b() function, but the _bytes() function on line 66 still uses 'latin-1' encoding. For consistency and to properly support UTF-8 characters (like Cyrillic), this should also be updated to 'utf-8'.

Note that this change also requires updating corresponding decode('latin-1') calls elsewhere in the codebase (e.g., in header_ecc.py line 603 and structural_adaptive_ecc.py line 657) to use decode('utf-8') instead, to ensure proper round-trip encoding/decoding of file paths with non-ASCII characters.
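
The round-trip concern can be sketched as follows, as an illustration only: the path is a made-up example, and the snippet uses plain `str.encode` / `bytes.decode` rather than the project's helpers. It shows that decoding UTF-8 bytes as latin-1 raises no error but silently yields a different string.

```python
# Hypothetical example path; not taken from the project.
path = "каталог/файл.txt"

# Encode the way the updated b() would (utf-8).
raw = path.encode("utf-8")

# Decoding with the old codec does not fail: latin-1 maps every byte
# to a character, so the result is mojibake rather than an exception.
mismatched = raw.decode("latin-1")

# Decoding with the matching codec restores the original path.
matched = raw.decode("utf-8")
```

Because the mismatch produces no exception, it would likely surface only as wrong file paths downstream, which is why the encode and decode sides need to change together.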

Owner

@copilot open a new pull request to apply changes based on this feedback

Owner

lrq3000 commented Nov 13, 2025

@codex
